DETAILED ACTION 
Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - (b) the invention was patented or described in a printed 
publication in this or a foreign country or in public use or on sale in this country, more than one year prior to 
the date of application for patent in the United States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

2. Claims 1, 3, and 5-6 are rejected under 35 U.S.C. 102(b) as being anticipated by 
Eatwell (US Patent No. 5742694). 

3. Regarding claims 1, 3, and 5, Eatwell discloses a method of encoding an audio 
signal, an audio encoder and system, comprising the steps of: 

determining basic waveforms in the audio signal (Predictable Component 3 in 
figure 4); 

obtaining a noise component from the audio signal by subtracting the basic 
waveforms from the audio signal (Prediction Error 4 in figure 4); 

modeling a spectrum of the noise component by determining auto-regressive 
and moving-average parameters (element 42 in figures 3 or 5, or referring to 
col. 8, ln. 22-47, where filter coefficients are determined); and 

including the auto-regressive and the moving-average parameters 
and waveform parameters representing the basic waveforms in an encoded audio 
signal (Output signal 8 in figures 3 or 5). 

4. Regarding claim 6, Eatwell discloses an encoded audio signal comprising: 

waveform parameters representing basic waveforms (Predictable Component 3 
in figure 4); and 

auto-regressive parameters and moving-average parameters representing a 
spectrum of a remaining noise component (element 42 in figures 3 or 5 or referring to 
col. 8, ln. 22-47). 

5. Claims 1-6 are rejected under 35 U.S.C. 102(e) as being anticipated by Miseki et 
al. (US Patent No. 6167375). 

6. Regarding claims 1, 3, and 5, Miseki et al. disclose a method of encoding an 
audio signal, an audio encoder and system, comprising the steps of: 

determining basic waveforms in the audio signal (Predictor 547 in figure 18); 

obtaining a noise component from the audio signal by subtracting the basic 
waveforms from the audio signal (Output of the summer 543 in figure 18); 

modeling a spectrum of the noise component by determining auto-regressive 
and moving-average parameters (col. 8, ln. 22-47, where filter coefficients 
are determined); and 

including the auto-regressive and the moving-average parameters 
and waveform parameters representing the basic waveforms in an encoded audio 
signal (Output signal 513 in figure 16). 

7. Regarding claim 6, Miseki et al. disclose an encoded audio signal comprising: 

waveform parameters representing basic waveforms (Predictor 547 in figure 18); 
and 

auto-regressive parameters and moving-average parameters representing a 
spectrum of a remaining noise component (col. 8, ln. 22-47, filter coefficients are 
determined). 

8. Regarding claims 2, 4, and 5, Miseki et al. disclose a method of decoding an 
encoded audio signal, an audio player and system, comprising the steps of: 

receiving an encoded audio signal comprising waveform parameters 
representing basic waveforms and auto-regressive and moving-average parameters 
representing a spectrum of a remaining noise component (col. 23, ln. 1-35 together with 
figure 20, ARMA parameters are transmitted to the decoder side for use); 

filtering a white noise signal to obtain a reconstructed noise component, which 
filtering is determined by the auto-regressive parameters and the moving-average 
parameters (Noise Decoder 290 in figure 23); 

synthesizing basic waveforms based on the waveform parameters (Speech 
Decoder 280 in figure 23); and 

adding the reconstructed noise component to the synthesized basic waveforms 
to obtain a decoded audio signal (Mixer 295 in figure 23). 

Claim Rejections - 35 USC § 103 

9. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

10. Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over Eatwell 
(US Patent No. 5742694). 

11. Regarding claim 7, Eatwell fails to disclose a storage medium on which an 
encoded audio signal as claimed in claim 6 is stored. However, it would have been 
obvious to one skilled in the art at the time of invention to implement the method in 
claim 6 in computer codes to facilitate maintenance and updating. 

12. Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over Miseki et 
al. (US Patent No. 6167375). 

13. Regarding claim 7, Miseki et al. fail to disclose a storage medium on which an 
encoded audio signal as claimed in claim 6 is stored. However, it would have been 
obvious to one skilled in the art at the time of invention to implement the method in 
claim 6 in computer codes to facilitate maintenance and updating. 

Conclusion 



The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. Akamine et al. (IEEE Publication) teach an ARMA model based 
speech coding scheme that is considered pertinent to the claimed invention. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Huyen Vo whose telephone number is 703-305-8665. 
The examiner can normally be reached on M-F, 9-5:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Doris To can be reached on 703-305-4827. The fax phone number for the 
organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Examiner Huyen X. Vo November 22, 2004 

ARMA MODEL BASED SPEECH CODING AT 8 KB/S 



Masami AKAMINE and Kimio MISEKI 



Video Sys. & Tech. Lab., Toshiba R&D Center 
Saiwai-ku, Kawasaki-shi, Japan 



ABSTRACT 

This paper proposes a new speech coding method for 
high quality performance at 8 kbps. The coder is 
based on an ARMA model as the prediction filter 
and a new excitation model. A simple ARMA 
analysis method is proposed. This analysis 
method features a technique for eliminating the 
fine harmonic structure within the speech 
spectrum. This overcomes the problem of MA 
parameter misestimation. The excitation signal 
is modeled as a pulse train whose density is 
varied depending on the residual signal's power. 
The proposed coder produced high quality speech 
comparable with 6-bit log PCM at 8 kbps 
according to computer simulation. 



1. INTRODUCTION 

Various speech coding methods at bit rates 
below 10 kbps have been proposed for 
applications to private communication networks 
and mobile systems. A class of methods, adaptive 
predictive coding (APC) [1][2], multi-pulse 
coding (MPC) [3][4], and code excited linear 
prediction coding (CELP) [5], seems to be 
promising for these applications. This class of 
methods represents a speech spectral envelope as 
an all-pole model, in other words, an AR model. 
However, there are not only poles but also zeros 
in speech spectra. In particular, nasal and 
consonant sounds tend to have spectral zeros. 
The AR model is less satisfactory in describing 
these zeros accurately. An ARMA model can be 
attractive for improving speech quality in low 
bit rate speech coding because of its ability to 
describe both spectral poles and zeros 
efficiently. However, the ARMA model has rarely 
been used in speech coding so far, because the 
resulting estimation problem in an ARMA based 
system is a non-linear problem. 

Ishizaki proposed a simple ARMA analysis 
method [6]. In the method, the AR and MA 
parameters are obtained separately. The AR 
parameters are determined by LPC analysis using 
the autocorrelation method. The MA parameters 
are obtained by inverting the power spectrum of 
the AR model's residual signal, and applying LPC 
analysis to the corresponding autocorrelation 
function. This ARMA estimation method requires 
only LPC analysis and FFT. Therefore, the method has 
low computational complexity. However, the method 
often fails to estimate the MA parameters 
because of the fine harmonic structure within 
speech spectra. Thus, a technique for 
eliminating the fine harmonic structure in the 
frequency domain and the time domain has been 
considered here. 
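
As a rough illustration of the AR step mentioned above (LPC analysis by the autocorrelation method), the following Python sketch computes the autocorrelation lags of one frame and solves the normal equations with the Levinson-Durbin recursion. The function names and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are assumptions of this sketch and are not taken from the paper.

```python
def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation lags r(0)..r(max_lag) of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return ([1, a_1, ..., a_p], residual energy) for A(z) = 1 + sum a_j z^-j."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Toy frame: impulse response of 1 / (1 - 0.9 z^-1).
# An order-1 fit should give a_1 close to -0.9.
frame = [0.9 ** t for t in range(200)]
coeffs, energy = levinson_durbin(autocorrelation(frame, 1), 1)
```

The same recursion can be reused for the MA step, because that step also reduces to LPC analysis once the residual spectrum has been inverted.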



Fig. 1 Block diagram of the proposed coding algorithm. 



The authors propose a new ARMA model based 
speech coder. The proposed coder includes a 
pitch filter, an ARMA filter, and an excitation 
signal model. The excitation signal is modeled 
as a pulse train whose density is varied 
depending on the residual signal's power. The 
ARMA residual signal is divided into several 
subframe signals in the time domain. The 
excitation signal is represented by closely 
spaced pulses in subframes with large residual 
power, while the excitation signal is 
represented by more widely spaced pulses in the 
subframes with low residual power. Each 
subframe's excitation is determined analytically 
to minimize the perceptually weighted errors 
between original and synthetic signals. 

2. CODING ALGORITHM 

Figure 1 shows a block diagram of the 
proposed coder. The coder consists of two parts. 
The first part includes a pitch filter and an 
ARMA filter. The second part is an excitation 
signal generator. The pitch filter generates the 
pitch periodicity of voiced speech. The ARMA 
filter restores the spectral envelope. The 
excitation signal is modeled as adaptive density 
pulses (ADP). Its amplitude is quantized with a 
vector quantizer. The pitch parameters, the ARMA 
parameters and the ADP's density and phase are 
transmitted to the receiver as side information. 
The coder will be called ARMA-ADP in this paper. 

A. ARMA ANALYSIS 

The proposed analysis method is depicted in 
Fig. 2. In this method, the AR and MA parameters 
are estimated separately. The well known LPC 
analysis method is used for estimating the AR 
parameters. After converting spectral zeros to 
poles, LPC analysis is applied to obtain the MA 
parameters. Zero-to-pole conversion is 
accomplished as follows. First, the power 
spectrum of the AR residual signal is calculated 
by FFT. Then the power spectrum is filtered in 
the frequency domain to remove its spectral 
harmonic structure. This filtering is equivalent 
to spectrum smoothing. The distinguishing feature 
of the proposed ARMA analysis is its method for 
removing the harmonic structure. If we fail to 
remove it, the MA parameters are misestimated, 
because the deep valleys of the harmonic 
structure are changed to acute peaks in the 
zero-to-pole conversion. 

Fig. 2 Block diagram of the proposed ARMA analysis method. 
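
The zero-to-pole conversion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the residual spectrum is a synthetic MA(1) spectrum with no harmonic structure, so the smoothing step is omitted, a plain cosine-sum inverse DFT stands in for the FFT, and only a first-order LPC fit is shown. All function names are hypothetical.

```python
import math


def inverse_dft_real(spectrum):
    """Inverse DFT of a real, even-symmetric spectrum (the result is real)."""
    n_points = len(spectrum)
    return [sum(value * math.cos(2.0 * math.pi * k * n / n_points)
                for k, value in enumerate(spectrum)) / n_points
            for n in range(n_points)]


def zero_to_pole_autocorrelation(power_spectrum):
    """Autocorrelation lags of the reciprocal (inverted) power spectrum."""
    inverted = [1.0 / value for value in power_spectrum]
    return inverse_dft_real(inverted)


# Toy MA(1) residual spectrum |1 + 0.5 e^{-jw}|^2 = 1.25 + cos(w) on 64 bins.
n_fft = 64
spectrum = [1.25 + math.cos(2.0 * math.pi * k / n_fft) for k in range(n_fft)]
r_inv = zero_to_pole_autocorrelation(spectrum)

# First-order LPC on the inverted spectrum: the predictor coefficient
# approximates the MA coefficient (0.5 in this toy case).
ma_1 = -r_inv[1] / r_inv[0]
```

Inverting the spectrum turns the zero of the toy filter into a pole, so ordinary all-pole (LPC) analysis of the resulting autocorrelation recovers the MA coefficient.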

The authors considered ways of 
eliminating the fine harmonic structure. Voiced 
speech waveforms are quasi-periodic, not 
strictly periodic. However, for the sake of 
simple explanation, voiced speech is assumed to 
be a strictly periodic signal with a pitch period 
Tp [s]. Then, the speech spectra are 
frequency-discrete. Considering the Fourier transform of 
the discrete spectra, the Fourier representation 
is a periodic function with a period Tp. A 
continuous spectrum can be obtained by using an 
ideal lowpass filter with a Tp/2 bandwidth. 
Therefore, the frequency characteristic of the 
filter must change adaptively depending on the 
pitch period. Also, the filter must have a 
zero-phase characteristic. Spectral zeros will be 
shifted unless the filter is zero-phase. It 
should be noted that a filter closely 
approximating an ideal lowpass filter is 
computationally expensive because the resulting 
filter is of a high order. A low order filter 
causes less distortion in the spectral envelope 
because the autocorrelation function r(k) of the 
AR residual signal decreases rapidly with k. 

Thus, in order to remove the fine 
structure, a first-order IIR filter was used 
whose coefficient is adaptively varied so that 
the filter's time constant is proportional to 
the interval of the pitch harmonics. Let N be 
the order of FFT and T [samples] the pitch 
period. The filter's coefficient a is described 
as a function of the pitch period as follows. 

    a = \exp(Nb/T)    (1) 

where b is a constant which is experimentally 
determined. 

Filtering is carried out in two directions, 
forward and backward, as follows. 

    D_f(k) = D(k) + a D_f(k-1),    k = 1, 2, \ldots, N    (2) 

    D_b(k) = D(k) + a D_b(k+1),    k = N, N-1, \ldots, 1    (3) 

    \bar{D}(k) = [D_f(k) + D_b(k)] / 2,    k = 1, 2, \ldots, N    (4) 

where D_f(k), D_b(k), D(k) and \bar{D}(k) are the 
forward filtered spectrum, the backward filtered 
spectrum, the power spectrum of the AR residual 
signal and the smoothed power spectrum, 
respectively. This makes the filter's phase 
characteristic zero-phase. 
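
The forward and backward recursions of Eqs. (2)-(4) can be sketched as follows. This is a minimal illustration: the zero initial states D_f(0) = 0 and D_b(N+1) = 0 are assumptions of the sketch (the text does not state them), and the helper for Eq. (1) follows the equation as printed, which yields a stable coefficient 0 < a < 1 only if the constant b is negative.

```python
import math


def smoothing_coefficient(n_fft, pitch_period, b):
    """Eq. (1) as printed: a = exp(N b / T). Assumes b < 0 so that 0 < a < 1."""
    return math.exp(n_fft * b / pitch_period)


def smooth_spectrum(power_spectrum, a):
    """Zero-phase smoothing by forward and backward first-order recursions."""
    n = len(power_spectrum)
    forward = [0.0] * n
    backward = [0.0] * n
    state = 0.0                                   # assumed D_f(0) = 0
    for k in range(n):
        state = power_spectrum[k] + a * state     # Eq. (2)
        forward[k] = state
    state = 0.0                                   # assumed D_b(N+1) = 0
    for k in range(n - 1, -1, -1):
        state = power_spectrum[k] + a * state     # Eq. (3)
        backward[k] = state
    return [(f + g) / 2.0 for f, g in zip(forward, backward)]   # Eq. (4)


# A single spectral line is spread symmetrically, i.e. without any shift.
smoothed = smooth_spectrum([0.0, 0.0, 1.0, 0.0, 0.0], 0.5)
```

Because the two passes mirror each other, the combined response is symmetric about each input bin, which is what keeps spectral zeros from being displaced.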

The operations described above are done in the 
frequency domain. The power spectrum smoothing 
and the zero-to-pole conversion can also be 
carried out in the time domain. Spectrum 
smoothing is carried out by windowing the 
corresponding autocorrelation function with a 
rectangular window. Zero-to-pole conversion is 
performed as follows. Let D_r(k) be the 
reciprocal of the smoothed power spectrum \bar{D}(k). 
Then 

    \bar{D}(k) D_r(k) = 1,    k = 1, 2, \ldots, N.    (5) 

By the inverse Fourier transform of the above 
equation, we can obtain 

    \sum_{k=0}^{N-1} r(k) r'(n-k) = \delta(n),    n = 0, 1, \ldots, N-1    (6) 

where r(k) and r'(k) are the autocorrelation 
functions corresponding to \bar{D}(k) and D_r(k), 
respectively, and \delta(n) is the Kronecker delta 
function. This equation is rewritten as follows. 



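    \begin{bmatrix}
    r(0)   & r(1)   & \cdots & r(N-1) \\
    r(1)   & r(0)   & \cdots & r(N-2) \\
    \vdots & \vdots & \ddots & \vdots \\
    r(N-1) & r(N-2) & \cdots & r(0)
    \end{bmatrix}
    \begin{bmatrix} r'(0) \\ r'(1) \\ \vdots \\ r'(N-1) \end{bmatrix}
    =
    \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}    (7) 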



The autocorrelation function r'(k) is obtained 
by solving the above equation. 

Comparing the operation in the time domain 
with the operation in the frequency domain from 
the viewpoint of computational complexity, the 
former is expensive. The computational 
complexity of the time domain operation is 
O(N^2), while the computational complexity of the 
frequency domain operation is O(N \log_2 N). What is 
worse, high precision is required to solve 
equation (7). 

Therefore, the frequency domain method was used. 
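
For comparison, the time domain route amounts to solving the symmetric Toeplitz system of equation (7) for r'(k). The Python sketch below does this for a tiny example with plain Gaussian elimination. The solver and function names are assumptions of this sketch; a general dense solver like this one costs more than a Toeplitz-specific recursion of the Levinson type and is used here only to keep the example short.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = acc / aug[row][row]
    return x


def inverse_autocorrelation(r):
    """Solve the symmetric Toeplitz system of Eq. (7) for r'(0)..r'(N-1)."""
    n = len(r)
    toeplitz = [[r[abs(i - j)] for j in range(n)] for i in range(n)]
    unit = [1.0] + [0.0] * (n - 1)
    return solve_linear(toeplitz, unit)


# Toy autocorrelation r(k) = 0.5^k; the exact solution is [4/3, -2/3, 0].
r_prime = inverse_autocorrelation([1.0, 0.5, 0.25])
```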

B. ADAPTIVE DENSITY PULSE MODEL 

The authors propose adaptive density pulses 
(ADP) as the synthesis filter's excitation 
signal. An ADP is a pulse train whose pulses are 
placed at a constant interval, namely, with a 
constant density within a subframe, but the 
density differs from subframe to subframe. The 
ARMA residual signal is divided into several 
subframe signals in a coding frame. The ADP's 
density is set high when the subframe's power is 
high, while the ADP's density is set low when 
the subframe's power is low. The amplitudes of 
the ADP are analytically determined to minimize 
the perceptually weighted errors between the 
original and synthetic speech signals. 

Let N be the coding frame length, L the 
subframe length, and M the number of subframes 
in the coding frame. The m'th subframe's 
excitation signal V^{(m)}(n) is described as 
follows. 

    V^{(m)}(n) = \sum_{i=1}^{Q_m} g_i^{(m)} \delta(n - (i-1) D_m - K_m),    n = 1, 2, \ldots, L,    1 \le K_m \le D_m    (8) 

where D_m, Q_m, K_m, and g_i^{(m)} denote the 
interval, number, phase, and amplitude of the 
ADP in the m'th subframe, respectively. 
Superscript (m) denotes that the sequence with 
the superscript is defined only in the m'th 
subframe. Time (n) is defined so as to be reset 
to one at each subframe's beginning. An example 
of ADP is shown in Fig. 3. The synthesized signal 
Y^{(m)}(n) is represented by the convolution 
between the synthesis filter's impulse response 
h(n) and the excitation signal V^{(m)}(n). 

    Y^{(m)}(n) = \sum_{j} V^{(m)}(j) h(n-j) 
               = \sum_{i=1}^{Q_m} g_i^{(m)} h(n - (i-1) D_m - K_m)    (9) 
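
To make the indexing of Eqs. (8) and (9) concrete, the following Python sketch builds the pulse train of one subframe and convolves it with an impulse response. It is a minimal illustration: the function names, the unit amplitudes, the two-tap impulse response and the zero filter state at the start of the subframe are assumptions of the sketch.

```python
def build_adp(subframe_length, interval, phase, amplitudes):
    """Excitation of one subframe following Eq. (8); time n runs from 1 to L."""
    excitation = [0.0] * subframe_length
    for i, gain in enumerate(amplitudes, start=1):
        n = (i - 1) * interval + phase            # pulse position, 1-indexed
        if not 1 <= n <= subframe_length:
            raise ValueError("pulse falls outside the subframe")
        excitation[n - 1] = gain
    return excitation


def synthesize(excitation, impulse_response):
    """Convolution of Eq. (9), truncated to the subframe (zero initial state)."""
    length = len(excitation)
    output = [0.0] * length
    for j, pulse in enumerate(excitation):
        if pulse == 0.0:
            continue
        for m, tap in enumerate(impulse_response):
            if j + m < length:
                output[j + m] += pulse * tap
    return output


# Third subframe of Fig. 3: L = 16, D = 4, Q = 4, K = 2 (unit amplitudes assumed).
pulses = build_adp(16, 4, 2, [1.0, 1.0, 1.0, 1.0])
positions = [n + 1 for n, value in enumerate(pulses) if value != 0.0]
synthetic = synthesize(pulses, [1.0, 0.5])        # hypothetical 2-tap h(n)
```

With these values the pulses fall at n = 2, 6, 10 and 14, matching the sparse pattern given for that subframe in Fig. 3.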

On the condition that D_m and Q_m are given, the 
amplitudes g_i^{(m)} and phase K_m of the m'th 
subframe's ADP are determined so as to minimize 
the total squared error 

    E^{(m)} = \sum_{n=1}^{L} [S_w^{(m)}(n) - Y_w^{(m)}(n)]^2,    (10) 

where S_w^{(m)}(n) denotes the weighted speech 
signal after subtracting the contributions 
carried over from the previous coding frame and 
subframes, and Y_w^{(m)}(n) denotes the weighted 
synthetic signal. Differentiating E^{(m)} with 
respect to the amplitudes g_i^{(m)} (i = 1, 2, \ldots, Q_m) 
and setting the derivatives to zero gives the 
following equations. 

    \sum_{i=1}^{Q_m} g_i^{(m)} R_{hh}(I, J) = R_{sh}^{(m)}(J),    (11) 

    I = (i-1) D_m + K_m,    J = (j-1) D_m + K_m,    1 \le i, j \le Q_m 

where R_{hh}(I, J) and R_{sh}^{(m)}(J) represent 

    \sum_{n=1}^{L} h_w(n-I) h_w(n-J) 

and 

    \sum_{n=1}^{L} S_w^{(m)}(n) h_w(n-J), 

respectively. The sequence h_w(n) denotes the 
weighted synthesis filter's impulse response. 
The D_m sets of candidates for the ADP's 
amplitudes are given by solving the equations 
(11) in terms of g_i^{(m)} for K_m = 1, 2, \ldots, D_m. 
The D_m kinds of minimum squared errors of E^{(m)}, 
E_{min}^{(m)}(K_m), are represented by substituting 
Eq. (11) into Eq. (10). 

    E_{min}^{(m)}(K_m) = \sum_{n=1}^{L} [S_w^{(m)}(n)]^2 - \sum_{i=1}^{Q_m} g_i^{(m)} R_{sh}^{(m)}(I),    I = (i-1) D_m + K_m    (12) 

Fig. 3 An example of ADP with N=48 and L=16: a coding frame of three subframes, subframe 1 dense (D_1 = 1, Q_1 = 16, K_1 = 1), subframe 2 sparse (D_2 = 4, Q_2 = 4, K_2 = 4), subframe 3 sparse (D_3 = 4, Q_3 = 4, K_3 = 2). 

Therefore the optimum phase of the ADP can be 
set to the K_m that maximizes the second 
summation on the right side of Eq. (12). The 
optimum amplitudes are chosen among the D_m sets 
of candidates according to the optimum phase K_m. The 
perceptual weighting filter W(Z) is represented by 
the ARMA synthesis filter H(Z) in the z 
transform domain by 

    W(Z) = H(rZ) / H(Z)    (13) 

where 0 < r < 1. 
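
The amplitude and phase search of Eqs. (10)-(12) can be sketched as below. For every candidate phase the normal equations (11) are solved, and the phase with the largest second summation of Eq. (12) is kept. This is a minimal sketch under stated assumptions: the function names are hypothetical, the weighted impulse response is a toy two-tap sequence, the target is noise-free, and a small Gaussian elimination routine stands in for whatever solver the authors used.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = acc / aug[row][row]
    return x


def h_at(h, m):
    """Causal impulse response: h(m) = 0 outside the stored taps."""
    return h[m] if 0 <= m < len(h) else 0.0


def search_adp(target, h, interval, count):
    """Return (phase K, amplitudes g) maximizing the gain term of Eq. (12)."""
    length = len(target)                  # subframe length L; target[n-1] is S_w(n)
    best = None
    for phase in range(1, interval + 1):
        pos = [(i - 1) * interval + phase for i in range(1, count + 1)]
        r_hh = [[sum(h_at(h, n - p) * h_at(h, n - q)
                     for n in range(1, length + 1)) for p in pos] for q in pos]
        r_sh = [sum(target[n - 1] * h_at(h, n - q)
                    for n in range(1, length + 1)) for q in pos]
        gains = solve_linear(r_hh, r_sh)                        # Eq. (11)
        gain_term = sum(g * r for g, r in zip(gains, r_sh))     # Eq. (12)
        if best is None or gain_term > best[0]:
            best = (gain_term, phase, gains)
    return best[1], best[2]


# Toy target built from pulses at n = 2 and n = 6 (K = 2, D = 4)
# with amplitudes 1 and -2, filtered by h = [1.0, 0.5].
target = [0.0, 1.0, 0.5, 0.0, 0.0, -2.0, -1.0, 0.0]
phase, gains = search_adp(target, [1.0, 0.5], interval=4, count=2)
```

In this noise-free toy case the search returns the phase and amplitudes used to build the target, because that choice drives the squared error of Eq. (10) to zero.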



3. EXPERIMENTAL RESULTS 

The inverse of the spectral flatness 
measure (sfm) [7] of the ARMA residual signals 
was calculated to evaluate the proposed ARMA 
analysis by computer simulations. The inverse of 
the sfm is a measure of waveform predictability 
[7]. Table 1 shows the segmental sfm^{-1} for the 
conventional ARMA analysis method (Method 0) and 
the two proposed methods, which are methods with 
means for eliminating the fine harmonic 
structure in the frequency domain (Method 1) and 
in the time domain (Method 2). The sfm^{-1}_{seg} is 
an averaged inverse of the sfm in dB units. 
Speech samples used for the simulations were one 
short Japanese sentence uttered by two male and 
two female speakers. The analysis frame length, 
the AR order, and the MA order were 256 samples, 
8, and 4, respectively. From Table 1, it can be 
seen that the proposed methods improved on the 
conventional method. 



Table 1  sfm^{-1}_{seg} of the ARMA residual signal. 

            Male 1   Male 2   Female 1   Female 2 
Method 0    8.64     5.65     9.77       8.14 
Method 1    5.07     4.08     5.97       4.79 
Method 2    5.33     3.96     6.14       5.00 
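
For readers who want to reproduce a figure of this kind, the sketch below computes an inverse spectral flatness value in dB from a power spectrum. It assumes the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum; the exact definition in reference [7] is not reproduced in this text, and averaging the per-frame dB values is likewise an assumption about how the segmental figure is formed.

```python
import math


def inverse_sfm_db(power_spectrum):
    """10 log10(arithmetic mean / geometric mean) of a positive power spectrum."""
    n = len(power_spectrum)
    arithmetic = sum(power_spectrum) / n
    log_geometric = sum(math.log(value) for value in power_spectrum) / n
    return 10.0 * (math.log10(arithmetic) - log_geometric / math.log(10.0))


def segmental_inverse_sfm_db(frame_spectra):
    """Average of the per-frame inverse sfm values, in dB (assumed convention)."""
    values = [inverse_sfm_db(spectrum) for spectrum in frame_spectra]
    return sum(values) / len(values)


flat = inverse_sfm_db([2.0, 2.0, 2.0, 2.0])      # flat spectrum: about 0 dB
peaky = inverse_sfm_db([1.0, 100.0])             # arithmetic 50.5, geometric 10
average = segmental_inverse_sfm_db([[2.0, 2.0], [1.0, 100.0]])
```

Under this definition a perfectly white residual gives 0 dB, and larger values indicate that predictable spectral structure remains in the residual.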






Computer simulations were conducted to 
evaluate the proposed ARMA-ADP coder's 
performance. Table 2 shows the coding parameters 
and the bit allocation for them. The coding 
frame length and subframe length were set to 240 
and 40 samples at an 8 kHz sampling rate, 
respectively. Two kinds of densities for ADP 
were used, that is, the pulse interval was two 
samples for dense ADPs and four samples for 
sparse ADPs. The number of subframes where ADP 
was dense was two per coding frame. The AR 
parameters were quantized after being converted 
to log-area ratios. The MA parameters were 
directly quantized. ADP was quantized by a 
vector quantizer (VQ). The codebook was designed 
with 30000 training vectors generated from real 
speech samples using the LBG algorithm. The 
speech samples used for subjective and objective 
tests were the same ones mentioned above. They 
were different from the speech samples used for 
the VQ's design. 

Table 3 shows the segmental SNR for the 
proposed coder at 8 kbps. The synthetic speech 
had high segmental SNRs, and was of high 
quality, comparable with 6-bit log PCM. 
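
A segmental SNR of this kind is commonly computed as the average of per-frame SNR values in dB. The Python sketch below follows that common definition; the frame length, the rule for skipping silent or perfectly coded frames, and the function name are assumptions of this sketch, since the paper does not spell out its exact procedure.

```python
import math


def segmental_snr_db(original, decoded, frame_length):
    """Mean over frames of 10 log10(signal energy / error energy)."""
    values = []
    for start in range(0, len(original) - frame_length + 1, frame_length):
        ref = original[start:start + frame_length]
        out = decoded[start:start + frame_length]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, out))
        if signal > 0.0 and noise > 0.0:      # skip silent or perfectly coded frames
            values.append(10.0 * math.log10(signal / noise))
    return sum(values) / len(values)


reference = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
coded = [0.9 * x for x in reference]          # 10 % amplitude error in every sample
snr = segmental_snr_db(reference, coded, 4)
```

In this toy case every frame has an error energy of one hundredth of the signal energy, so the result is close to 20 dB regardless of the frame's level.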



Table 2  Coding parameters. 

Bit rate                  8 kbps 
Sampling rate             8 kHz 
Frame length 
  Analysis                32 ms (256 samples) 
  Coding                  30 ms (240 samples) 
Pitch analysis            1st order 
  Bit allocation          12 bits 
ARMA analysis             AR 8th, MA 4th 
  Bit allocation          48 bits 
ADP 
  Subframe length         40 samples 
  Pulse interval          2 samples (2 subframes) 
                          4 samples (4 subframes) 
  VQ  Dimension           5 
      Size                10 bits 
  Bit allocation          180 bits 
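
As a quick consistency check on Table 2, the arithmetic below confirms that the three bit allocations add up to 8 kbps for a 240-sample frame at 8 kHz, and counts the pulses per coding frame. The last part is a reading of the table rather than something the paper states: it assumes the pulse amplitudes are coded five at a time with a 10-bit VQ index, and the table does not itemize how the remaining ADP bits are spent.

```python
# Values read from Table 2.
frame_samples = 240                    # coding frame length
sampling_rate_hz = 8000
bits_pitch, bits_arma, bits_adp = 12, 48, 180

bits_per_frame = bits_pitch + bits_arma + bits_adp
bit_rate_bps = bits_per_frame * sampling_rate_hz / frame_samples

# Pulse count per coding frame: 2 dense subframes (interval 2) and
# 4 sparse subframes (interval 4), each 40 samples long.
subframe_samples = 40
pulses_per_frame = 2 * (subframe_samples // 2) + 4 * (subframe_samples // 4)

# Assumption: amplitudes are coded 5 at a time with a 10-bit VQ index.
vq_dimension, vq_index_bits = 5, 10
amplitude_bits = (pulses_per_frame // vq_dimension) * vq_index_bits
other_adp_bits = bits_adp - amplitude_bits
```

Under these assumptions the amplitudes take 160 of the 180 ADP bits, which is at least consistent with the stated allocation.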



Table 3  Coder's performance. 

            Male 1   Male 2   Female 1   Female 2 
SNRseg      15.6     14.3     15.7       15.3 

4. CONCLUSION 

The authors have proposed an ARMA based 
speech coding method with a new excitation 
signal model. An improved ARMA analysis has been 
developed. This analysis is based on LPC 
analysis and spectral inversion, in other words, 
zero-to-pole conversion. Hence, it requires less 
computation, since the FFT algorithm and LPC 
analysis can be directly utilized. The MA 
parameter misestimations are avoided by 
eliminating the spectral harmonic structure 
within the AR residual signal. The excitation 
signal is modeled as a pulse train whose density 
is varied subframe by subframe depending on the 
residual signal's power. The amplitudes of the 
pulse train are analytically solved to minimize 
the perceptually weighted errors between the 
original and synthetic signals. According to an 
informal listening test, the subjective 
speech quality of the coder was comparable to 
that of 6-bit log PCM. 